FRAME—Monte Carlo model for evaluation of the stable isotope mixing and fractionation

Bayesian stable isotope mixing models are widely used in geochemical and ecological studies for partitioning sources that contribute to various mixtures. However, none of the existing tools allows accounting for the influence of processes other than mixing, especially stable isotope fractionation. Bridging this gap, new software for the stable isotope Fractionation And Mixing Evaluation (FRAME) has been developed with a user-friendly graphical interface (malewick.github.io/frame). This calculation tool allows simultaneous sources partitioning and fractionation progress determination based on the stable isotope composition of sources/substrates and mixture/products. The mathematical algorithm applies the Markov-Chain Monte Carlo model to estimate the contribution of individual sources and processes, as well as the probability distributions of the calculated results. The performance of FRAME was comprehensively tested and practical applications of this modelling tool are presented with simple theoretical examples and stable isotope case studies for nitrates, nitrites, water and nitrous oxide. The open mathematical design, featuring custom distributions of source isotope signatures, allows for the implementation of additional processes that alternate the characteristics of the final mixture and its application for various range of studies.

• Input data are treated as a range of equally probable values instead of a single value with uncertainty. The introduction of such a feature has a number of practical consequences (discussed in Sec. ??), however, it also results with substantial changes in the mathematical formulation of the model. In particular, the likelihood function needs to be modified to account for the non-normal probability distribution of values characterising mixing components. The following section describes the details and derivation of the new, modified likelihood function.

A.1.1 Modified likelihood function
The standard form of the likelihood function is given as a Gaussian: with the width σ standing for combined uncertainty of model parameters (quadratic sum of uncertainties of isotopic signatures and auxiliary parameters), x stands for the measured isotopic signature of the sample and µ is the model output value, which most general definition can be written as: November 2, 2022 1/13 The variable µ can be rewritten as where α = ∂µ ∂S0 and µ does not depend on S 0 (linear dependence of µ on S 0 was assumed). Let us consider the isotopic signature of the source S 0 as a range of equally probable values, thus being represented as a flat probability distribution in the range of: (S 0 − ∆S 0 , S 0 + ∆S 0 ). The likelihood function with such a modification will be derived and then the equation will be generalized to account for a flat distribution of all S i sources. For convenience, the apostrophe in µ will be dropped and written simply as µ.
Since the source isotopic signature can be any value in the range (S 0 − ∆S 0 , S 0 + ∆S 0 ) the basic likelihood function (Eq. 1) will become an integral: 2σ 2 and the normalization can be calculated with: Finally we obtain: Such a function can be easily generalized to the finite flat distribution of all contributing sources: which can be expressed by invoking the original definition of x and αi (multiplicative constants were dropped, as absolute normalization is irrelevant): ∆i is dependent on fractions fi (and possibly other auxiliary variables), thus it must be evaluated in each MC iteration.  Figure S1. Comparison of gaussian-like and erf-like (f (x) and g(x) respectively) function shapes depending on the relation between ∆ and σ. Note that in the limit of vanishing ∆ functions g and f become close to equal, however function g is indefinite at ∆ = 0, so FRAME switches back to gaussian-like likelihood function in these cases. Figure S2. The main interface of the program. Figure S2 presents the main interface of the program, in which the input data is loaded and simualtion parameters are set. All the input files for the model should be prepared in .csv format. All the example input files used for simple examples (Section 3) and for the case studies (Section 5) described in the paper are accessible on the FRAME website (https://malewick.github.io/frame/).

Load samples
First, the .csv file with samples data should be uploaded. The stable isotope signatures of the samples should be introduced into the model by defining the following columns: • group-the following number for samples group representing common samples class or series of measurements. In other words, the group is a pool of repetition of the same sample and only one solution/result will be calculated for the group without calculation of individual fractions for each sample • label -identification of individual samples which is displayed on the graphs generated by the model.
• stable isotopic signatures in two columns for 2D model and three columns for 3D model -with the user-defined name as the heading (e. g., in plain text "d18O" or using unicode characters "δ 18 O") and δ-values in per-mill (‰) as entries for each sample.
• standard deviation of stable isotopic signatures in two columns for 2D model and three columns for 3D model -with heading as stdev(isotopic signature), e.g. stdev(δ 18 O), and the values of measurement uncertainties as entries for each sample. See examples: NO3data.csv, H2Odata.csv, NITRITEdata.csv, NO2data.csv 2. Load sources Second, the .csv file with sources data should be uploaded. The isotopic characteristic of the sources should be introduced into the model by defining the following columns: • 'source' -name or abbreviation of the source • isotopic signatures in two columns for 2D model and three columns for3D modelwith the name of isotopic signature as the heading, identical to the headings defined in samples file (e.g.δ 18 O) and mean isotopic δ values as entries for each source. In case of defining the range of equally probable values representing the source the mean value = (range max + range min)/2 should be given.
• spread of the isotopic signature -e.g. spread(δ 18 O) -the spread of the isotopic range of the particular source, i.e., if spread = (range max -range min)/2, the range is defined as mean value ± spread. All the values within this range have equal probability. In case of one well-determined mean value of the highest probability , the spread should be entered as zero.
• standard deviation of the isotopic signature -e.g. stdev(δ 18 O) -standard deviation representing the analytical uncertainty of the isotopic analysis of the source valuesthis uncertainty is added to the mean value ± spread range as a margin with Gaussian likelihood (see Figure S1). You can also save your current configuration into xml file, which can later be conveniently loaded to resume work (File→Save, File→Load).
Once you run the model you will see the output console informing you about the state of the calculation and three additional plotting canvases will appear (see Fig. S3).
• Figure 1. Time-series of entries accepted by the Metropolis-Hastings algortihm, that build the Markov chain. The burnout period is marked with a dashed lined.
• Figure 2. On the diagonal there are histograms showing distributions of evaluated variables. Panels above the diagonal show the correlation between the variables and panels below the diagonal show the same correlation, but evaluated as a single number.
• Figure 3. For each accepted set of variables the isotope mixture is evaluated and plotted as a path.
These plots serve the purpose of real-time quality assessment during the calculation and it is advised to always begin analyses by running a couple of test samples to see if everything works as expected. However, when running the analysis for multiple samples, it is advised to turn off the on-line plotting to reduce the computation time.
Also, when the model is run the FRAME interface switches to a new tab, which contains the output console and the progress bar to inform you about the status of the computation. Figure S3. The view of the running simulation.
When the run is finished the results are saved in the given directory in the folder (named with the combined input file names) as results.csv that contains the calculated final values for each unknown parameter (in the following columns) for each sample group (in the following rows). The following values are determined: -mean -median -standard deviation (given as 1 σ) -lim low -lower limit of 68% confidence level (calculated as shown in Sect. 2.3) -lim upupper limit of 68% confidence level (calculated as shown in Sect. 2.3) The 3 figures listed above are also saved in the output folder.
The interface can be practically tested using the files prepared for the example datasets presented in the Section ?? which are available at https://github.com/malewick/frame: -2D model for determination of nitrate sources and fractionation (??): sources (NO3 sources.csv), measured values of the mixture (NO3 data.csv), fractionation (NO3 frac.csv)

C More examples with FRAME C.1 Many indistinguishable sources (2D)
Here we show the example model outputs in case of too many indistinguishable sources are introduced into the model and it fails in finding the solution (Figs. (S4, S5, S6, S7).  Figure S8. Mixing configurations (f 1 , f 2 , f 3 ) from all model iterations, which fulfilled the Metropolis condition -the Markov chain. A number of initial iterations is discarded in order for the model to stabilize (burnout, left to the dashed line), reaching the region of maximum likelihood. In MCMC simulations it is crucial that the stabilization is indeed reached, therefore such plot is a key tool for quality assessment. In case the stabilization is not achieved, the plot would display large variations, exceeding the iteration-by-iteration fluctuations, as well as longer trends would appear, persisting for a > 1 number of iterations.  Figure S11. The candle plots illustrating the results of the simulation, comparing with the true values (drawn in red). The blue circle represents the mean, the horizontal line stands for median, the box shows the boundaries of the middle quartiles, and the whiskers enclose 95% of the distribution.